1 Prediction caches for superscalar processors
James E. Bennett, Michael J. Flynn 

December 1997 Proceedings of the 30th annual ACM/IEEE international symposium on
Microarchitecture

Full text available: pdf(1.02 MB) Additional Information: full citation, abstract, references, index terms
Publisher Site

Processor cycle times are currently much faster than memory cycle times, and this gap 
continues to increase. Adding a high speed cache memory allows the processor to run at full 
speed, as long as the data it needs is present in the cache. However, memory latency still 
affects performance in the case of a cache miss. Prediction caches use a history of recent 
cache misses to predict future misses and to reduce the overall cache miss rate. This paper 
describes several prediction caches, and introdu ... 



Keywords: Dynamic scheduling, Memory latency, Stream buffer, Victim cache, Prediction 
cache 



2 Predictive caching strategy for on-demand routing protocols in wireless ad hoc
networks 

Wenjing Lou, Yuguang Fang 

November 2002 Wireless Networks, volume 8 issue 6 

Full text available: pdf(221.65 KB) Additional Information: full citation, abstract, references, citings, index terms

Route caching strategy is important in on-demand routing protocols in wireless ad hoc 
networks. While high routing overhead usually has a significant performance impact in low 
bandwidth wireless networks, a good route caching strategy can reduce routing overheads 
by making use of the available route information more efficiently. In this paper, we first 
study the effects of two cache schemes, "link cache" and "path cache", on the performance 
of on-demand routing protocols through simulations base ... 

Keywords: ad hoc networks, dynamic source routing, on-demand routing, timeout 
mechanism 
March 2003 The Journal of Machine Learning Research, volume 3

Full text available: pdf(5.52 MB) Additional Information: full citation, abstract, index terms

A single signal processing algorithm can be represented by many mathematically equivalent 
formulas. However, when these formulas are implemented in code and run on real 
machines, they have very different runtimes. Unfortunately, it is extremely difficult to 
model this broad performance range. Further, the space of formulas for real signal 
transforms is so large that it is impossible to search it exhaustively for fast 
implementations. We approach this search question as a control learning problem ... 

4 Way-predicting set-associative cache for high performance and low energy 
consumption 

Koji Inoue, Tohru Ishihara, Kazuaki Murakami 

August 1999 Proceedings of the 1999 international symposium on Low power 
electronics and design 

Full text available: pdf(375.17 KB) Additional Information: full citation, references, citings, index terms



5 Memory hierarchies: Direct load: dependence-linked dataflow resolution of load 
address and cache coordinate 

Byung-Kwon Chung, Jinsuo Zhang, Jih-Kwon Peir, Shih-Chang Lai, Konrad Lai 
December 2001 Proceedings of the 34th annual ACM/IEEE international symposium on 
Microarchitecture 

Full text available: pdf(1.38 MB) Additional Information: full citation, abstract, references, citings
Publisher Site

An increasing cache latency in future processors incurs profound performance impacts in 
spite of advanced out-of-order execution techniques. In this paper, we describe an early 
address resolution mechanism that accurately resolves both regular and irregular load 
addresses. The basic idea is to build dynamic dependence links from the instruction that 
updates the base register to the consumer load instructions. Once a new base address is 
available, it triggers calculations of the new load addresse ... 

6 Simultaneous subordinate microthreading (SSMT) 

Robert S. Chappell, Jared Stark, Sangwook P. Kim, Steven K. Reinhardt, Yale N. Patt 
May 1999 ACM SIGARCH Computer Architecture News, Proceedings of the 26th
annual international symposium on Computer architecture, volume 27 issue 2

Full text available: pdf(129.54 KB) Additional Information: full citation, abstract, references, citings, index terms
Publisher Site

Current work in Simultaneous Multithreading provides little benefit to programs that aren't 
partitioned into threads. We propose Simultaneous Subordinate Microthreading (SSMT) to 
correct this by spawning subordinate threads that perform optimizations on behalf of the 
single primary thread. These threads, written in microcode, are issued and executed 
concurrently with the primary thread. They directly manipulate the microarchitecture to 
improve the primary thread's branch prediction accuracy, cac ... 

7 Embedded systems: applications, solutions and techniques (EMBS): An energy 
efficient cache memory architecture for embedded systems 

Park Jung-Wook, Kim Cheong-Ghil, Lee Jung-Hoon, Kim Shin-Dug 

March 2004 Proceedings of the 2004 ACM symposium on Applied computing 

Full text available: pdf(277.70 KB) Additional Information: full citation, abstract, references, index terms, review

This paper proposes a modified two-way set associative cache for embedded systems to 
reduce the energy consumption. For this goal, the proposed cache, called SSA (selective-
way-access skewed associative) cache, equips with a way-selecting mechanism controlled 
by skewing function and small table look-up, which also has the reconfigurable ability to be 
converted to one direct mapped cache on a specific application. The skewing mechanism 
including differentiated mapping function for each cache set, ... 

Keywords: embedded system, low power cache, memory hierarchy, selective way access, 
skewed associativity 



8 Difficult-path branch prediction using subordinate microthreads

Robert S. Chappell, Francis Tseng, Adi Yoaz, Yale N. Patt 

May 2002 ACM SIGARCH Computer Architecture News, volume 30 issue 2 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms
Publisher Site

Branch misprediction penalties continue to increase as microprocessor cores become wider 
and deeper. Thus, improving branch prediction accuracy remains an important challenge. 
Simultaneous Subordinate Microthreading (SSMT) provides a means to improve branch 
prediction accuracy. SSMT machines run multiple, concurrent microthreads in support of the 
primary thread. We propose to dynamically construct microthreads that can speculatively 
and accurately pre-compute branch outcomes along frequently mis ... 

Keywords: high performance microprocessor, branch prediction, SSMT, SMT, helper 
thread, microarchitecture, microthread 



9 Information Retrieval: Predictive caching and prefetching of query results in search 
engines 

Ronny Lempel, Shlomo Moran 

May 2003 Proceedings of the 12th international conference on World Wide Web 

Full text available: pdf(212.73 KB) Additional Information: full citation, abstract, references, citings, index terms

We study the caching of query result pages in Web search engines. Popular search engines 
receive millions of queries per day, and efficient policies for caching query results may 
enable them to lower their response time and reduce their hardware requirements. We 
present PDC (probability driven cache), a novel scheme tailored for caching search results, 
that is based on a probabilistic model of search engine users. We then use a trace of over 
seven million queries submitted to the search engine A ... 

Keywords: caching, query processing and optimization 



10 Caching: Efficient prediction of web accesses on a proxy server 
Wenwu Lou, Hongjun Lu 

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management 

Full text available: pdf(350.55 KB) Additional Information: full citation, abstract, references, index terms

Web access prediction is an active research topic with many applications. Various 
approaches have been proposed for Web access prediction in the domain of individual Web 
servers but they have to be tailored to the domain of proxy servers to satisfy its special 
requirements in prediction efficiency and scalability. In this paper, the design and 
implementation of proxy-based prediction service (PPS) is presented. For prediction 
efficiency, PPS applies a new prediction scheme which employs a two-la ... 
11 Mining web logs for prediction models in WWW caching and prefetching
Qiang Yang, Haining Henry Zhang, Tianyi Li 

August 2001 Proceedings of the seventh ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(413.83 KB) Additional Information: full citation, abstract, references, citings, index terms

Web caching and prefetching are well known strategies for improving the performance of 
Internet systems. When combined with web log mining, these strategies can decide to 
cache and prefetch web documents with higher accuracy. In this paper, we present an 
application of web log mining to obtain web-document access patterns and use these 
patterns to extend the well-known GDSF caching policies and prefetching policies. Using 
real web logs, we show that this application of data mining can achieve dr ... 

Keywords: Application to Caching and Prefetching on the WWW, Web Log Mining 



12 Improving CISC instruction decoding performance using a fill unit
Mark Smotherman, Manoj Franklin 

December 1995 Proceedings of the 28th annual international symposium on
Microarchitecture

Full text available: pdf(965.34 KB) Additional Information: full citation, references, citings, index terms



13 Hardware-driven prefetching for pointer data references 
Chi-Hung Chi, Chin-Ming Cheung 

July 1998 Proceedings of the 12th international conference on Supercomputing 

Full text available: pdf(1.06 MB) Additional Information: full citation, references, citings, index terms



14 The cache performance and optimizations of blocked algorithms 
Monica D. Lam, Edward E. Rothberg, Michael E. Wolf 

April 1991 Proceedings of the fourth international conference on Architectural support
for programming languages and operating systems, volume 19, 25, 26 issue 2,
Special Issue, 4

Full text available: pdf(1.20 MB) Additional Information: full citation, references, citings, index terms



15 Coupling compiler-enabled and conventional memory accessing for energy efficiency
Raksit Ashok, Saurabh Chheda, Csaba Andras Moritz 

May 2004 ACM Transactions on Computer Systems (TOCS), volume 22 issue 2 

Full text available: pdf(1.41 MB) Additional Information: full citation, abstract, references, index terms

This article presents Cool-Mem, a family of memory system architectures that integrate 
conventional memory system mechanisms, energy-aware address translation, and 
compiler-enabled cache disambiguation techniques, to reduce energy consumption in 
general-purpose architectures. The solutions provided in this article leverage on interlayer 
tradeoffs between architecture, compiler, and operating system layers. Cool-Mem achieves 
power reduction by statically matching memory operations with energy-eff ... 
16 Data cache locking for higher program predictability
Xavier Vera, Bjorn Lisper, Jingling Xue 

June 2003 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
2003 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems, volume 31 issue 1

Full text available: pdf(292.01 KB) Additional Information: full citation, abstract, references, citings, index terms

Caches have become increasingly important with the widening gap between main memory 
and processor speeds. However, they are a source of unpredictability due to their 
characteristics, resulting in programs behaving in a different way than expected. Cache 
locking mechanisms adapt caches to the needs of real-time systems. Locking the cache is a 
solution that trades performance for predictability: at a cost of generally lower 
performance, the time of accessing the memory becomes predictable. This pape ...

Keywords: data cache analysis, worst-case execution time 



17 Session 5: An adaptive serial-parallel CAM architecture for low-power cache blocks
Aristides Efthymiou, Jim D. Garside 

August 2002 Proceedings of the 2002 international symposium on Low power 
electronics and design 

Full text available: pdf(151.38 KB) Additional Information: full citation, abstract, references, citings, index terms

There is an on-going debate about which consumes less energy: a RAM-tagged associative 
cache with an intelligent order of accessing its tags and ways (e.g. way prediction), or a 
CAM-tagged high associativity cache. If a CAM search can consume less than twice the 
energy of reading a tag RAM, it would probably be the preferred option for low-power 
applications. Based on memory traces — which usually cause tag mismatch within the 
lower four bits — a new serial CAM organisation is proposed which ... 

Keywords: CAM, VLSI, asynchronous circuits, cache design, low energy, low power 

18 Timing analysis and memory optimization for embedded systems: Associative caches
in formal software timing analysis

Fabian Wolf, Jan Staschulat, Rolf Ernst 

June 2002 Proceedings of the 39th conference on Design automation 

Full text available: pdf(222.32 KB) Additional Information: full citation, abstract, references, citings, index terms

Precise cache analysis is crucial to formally determine program running time. As cache 
simulation is unsafe with respect to the conservative running time bounds for real-time 
systems, current cache analysis techniques combine basic block level cache modeling with 
explicit or implicit program path analysis. We present an approach that extends instruction 
and data cache modeling from the granularity of basic blocks to program segments thereby 
increasing the overall running time analysis precision. ... 

Keywords: cache analysis, embedded software, real-time, timing analysis 
Dorota M. Huizinga, Patrick Mann
February 1996 Proceedings of the 1996 ACM symposium on Applied Computing

Full text available: pdf(857.48 KB) Additional Information: full citation, references, citings, index terms

Keywords: disconnected operation, mobile computing 
Chi-Hung Chi, Jun-Li Yuan, Chin-Ming Cheung

May 1999 Proceedings of the 13th international conference on Supercomputing 